**----------------------------------------------------------------------------**
Replication file
Project: Local Corruption & Household Business Tax Compliance
Publication Journal: Journal of Economic Behavior and Organization (July 2020)
Authors: Duong T. Le, Eddy Malesky, Anh Pham
Date: July 2020
Software: STATA SE 15.1; ArcGIS for Desktop 10.5
**----------------------------------------------------------------------------**

**---Datasets---**

Below is the list of raw datasets that we use to construct the datasets for our paper, along with their original sources.

1.  Vietnam Household Business Census 2017 (VHBS2017_original.dta)- confidential. 
	We obtained this dataset from the General Statistics Office (GSO) of Vietnam.
	Readers interested in replicating the project can contact GSO to purchase the data.  

2. "pairwise_full_GIS_allObs.dta" and "pairwise_full_GIS_onlyID.dta"; available in the "data/sub_data" folder: 
		_This is the pairwise commune data constructed from Vietnam's admin3 country boundary shapefile (polygon) downloaded from www.gadm.org.
		We use ArcGIS for Desktop software to process the dataset. 
		_All spatial variables (e.g., distance to provincial border, neighboring status, size, location) are derived from a combination of spatial tools: Near, Intersection, Buffer, and Select by Location.
		_All PCI variables are collected from https://pcivietnam.vn/en and merged by provincial IDs. 

3. "pci.dta" in the "data" folder: 
	 Provincial scores from the Provincial Competitiveness Index (PCI) for three years: 2017 (main), 2016 and 2015 (robustness). 
	 PCI data is publicly available and can be downloaded directly from the PCI website at: https://pcivietnam.vn/en/pci-data. 

4. "land_character" in the "data/sub_data" folder:
	Includes geographical characteristics of communes in Vietnam. We downloaded this data from the replication data made available for the published paper
	"The Historical State, Local Collective Action, and Economic Development in Vietnam" by Dell et al. (2018, Econometrica). 

5. "econ_zone" in the "data/sub_data" folder:
	This data reflects the authors' most recent effort to collect data on industrial parks, economic zones, and communes that 
	receive differential treatment on corporate income taxes as of 2017. Note that the data might be incomplete. 

6. "enterprise2014.dta" - confidential (once obtained, place it in the "data/sub_data" folder):
	This is the enterprise dataset in 2017 by the Vietnamese General Statistics Office (GSO). We purchased the dataset from the GSO. 
	Readers interested in replicating the project can contact GSO to obtain the data.

7. "light2016.dta" in the "data/sub_data" folder:
	We download the night-time light radiance data suite (NASA's annual composite) in raster format from https://eogdata.mines.edu/download_dnb_composites.html
	We use ArcGIS for Desktop software to compute 2016 average nightlight radiance for each commune in Vietnam. 	

8.  "pairwise_full_GIS_AtoA_3km.dta" and "pairwise_full_GIS_AtoA_3km_onlyID.dta" in the "data" folder:
	 _These are the pairwise commune data for the robustness check exercise. The communes in a pair are located in two contiguous provinces but are within 3 km of one another.
	 Prepared with ArcGIS for Desktop software. 
	_All spatial variables (e.g., distance to provincial border, neighboring status, size, location) are derived from a combination of spatial tools: Near, Intersection, Buffer, and Select by Location.
	_All PCI variables are collected from https://pcivietnam.vn/en and merged by provincial IDs. 

9.  "commune-data-robust.dta" in the "data" folder:
	This data consists of all communes and their distances to all provincial borders. It is used for the bunching test (Appendix). 
	Prepared with ArcGIS for Desktop software. 


**---Do-files---**
"hhb_data_construction.do" constructs the final regression datasets, inputting various raw data products (stored in the "sub_data" folder)
"hhb_data_construction_appendix.do" constructs additional datasets that are used for analyses in the Appendix

"hhb_tables.do" produces all tables in the main text
"hhb_figures.do" produces all figures in the paper
"hhb_appendix.do" produces all appendix tables 

Please first run "hhb_data_construction.do", then run "hhb_data_construction_appendix.do". After that, you can run "hhb_tables.do", "hhb_figures.do", and "hhb_appendix.do" in any order.
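
The run order described above can be captured in a small wrapper do-file. The sketch below is only an illustration and is not part of the original replication package: the file name "master.do" is hypothetical, and it assumes that Stata's working directory is the package root, that the five do-files sit in that directory, and that the "data/sub_data" folder is reachable by that relative path. Adjust the paths if your layout differs.

```stata
* master.do: hypothetical wrapper, NOT shipped with the replication package.
* Assumption: the working directory is the package root and contains the do-files.
version 15.1

* Assumption: the confidential enterprise data has been placed in data/sub_data,
* as the dataset list requests. Stop early with a clear message if it is missing.
* (The location of VHBS2017_original.dta is not stated, so it is not checked here.)
capture confirm file "data/sub_data/enterprise2014.dta"
if _rc {
    display as error "enterprise2014.dta not found in data/sub_data; obtain it from GSO first"
    exit 601
}

* Step 1: construct the final regression datasets
do "hhb_data_construction.do"

* Step 2: construct the additional datasets used in the Appendix
do "hhb_data_construction_appendix.do"

* Step 3: produce outputs; these three can run in any order
do "hhb_tables.do"
do "hhb_figures.do"
do "hhb_appendix.do"
```

Running the two construction steps first matters because the table, figure, and appendix do-files read the datasets those steps create.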
